Abstract

In this paper, we study the classical problem of estimating the proportion of a finite population. First, we consider a fixed sample size method and derive an explicit sample size formula which ensures a mixed criterion of absolute and relative errors. Second, we consider an inverse sampling scheme such that the sampling continues until the number of units having a certain attribute reaches a threshold value or the whole population is examined. We have established a simple method to determine the threshold so that a prescribed relative precision is guaranteed. Finally, we develop a multistage sampling scheme for constructing a fixed-width confidence interval for the proportion of a finite population. Powerful computational techniques are introduced to make it possible for the fixed-width confidence interval to ensure a prescribed level of coverage probability.

1 Fixed Sample Size Method

The estimation of the proportion of a finite population is a basic and very important problem in probability and statistics [6], [8]. This problem finds applications spanning many areas of science and engineering. The problem is formulated as follows.

Consider a finite population of $N$ units, among which there are $M$ units having a certain attribute. The objective is to estimate the proportion $p = \frac{M}{N}$ based on sampling without replacement.

One popular method of sampling is to draw $n$ units without replacement from the population and count the number, $k$, of units having the attribute. Then, the estimate of the proportion is taken as $\hat{p} = \frac{k}{n}$. In this process, the sample size $n$ is fixed.

Clearly, the random variable $k$ possesses a hypergeometric distribution. The reliability of the estimator $\hat{p} = \frac{k}{n}$ depends on $n$. For the purpose of error control, we are interested in the following crucial question:
For prescribed margin of absolute error $\varepsilon_a \in (0,1)$, margin of relative error $\varepsilon_r \in (0,1)$, and confidence parameter $\delta \in (0,1)$, how large should the sample size $n$ be to guarantee
\[
\Pr\left\{ |\hat{p} - p| < \varepsilon_a \ \text{ or } \ \left| \frac{\hat{p} - p}{p} \right| < \varepsilon_r \right\} > 1 - \delta? \tag{1}
\]

In this regard, we have 

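Theorem 1 Let $\varepsilon_a \in (0,1)$ and $\varepsilon_r \in (0,1)$ be real numbers such that $\frac{\varepsilon_a}{\varepsilon_r} + \varepsilon_a \le \frac{1}{2}$. Then, (1) is guaranteed provided that
\[
n > \frac{\varepsilon_r \ln \frac{2}{\delta}}{(\varepsilon_a + \varepsilon_a \varepsilon_r) \ln(1 + \varepsilon_r) + (\varepsilon_r - \varepsilon_a - \varepsilon_a \varepsilon_r) \ln\left(1 - \frac{\varepsilon_a \varepsilon_r}{\varepsilon_r - \varepsilon_a}\right)}. \tag{2}
\]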

The proof of Theorem 1 is given in Appendix A. It should be noted that conventional methods for determining sample sizes are based on normal approximation; see [6] and the references therein. In contrast, Theorem 1 offers a rigorous method for determining sample sizes. To reduce conservativeness, a numerical approach has been developed by Chen [1] which permits exact computation of the minimum sample size.
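As an illustration of how the bound of Theorem 1 might be evaluated, the following Python sketch returns the smallest integer exceeding the right-hand side of (2). The function names `g` and `sample_size` are hypothetical and introduced only for this example. The closed form is written under the assumption that (2) is the rearrangement of the condition $2\exp(n\, g(\varepsilon_a, \varepsilon_a/\varepsilon_r)) < \delta$ used in Appendix A, and that the condition of the theorem reads $\varepsilon_a/\varepsilon_r + \varepsilon_a \le 1/2$.

```python
import math


def g(eps: float, p: float) -> float:
    """Exponent g(eps, p) of Appendix A (assumes 0 < eps < 1 - p)."""
    return ((p + eps) * math.log(p / (p + eps))
            + (1 - p - eps) * math.log((1 - p) / (1 - p - eps)))


def sample_size(eps_a: float, eps_r: float, delta: float) -> int:
    """Smallest integer n strictly greater than the right-hand side of (2)."""
    if not (0 < eps_a < 1 and 0 < eps_r < 1 and 0 < delta < 1):
        raise ValueError("eps_a, eps_r and delta must lie in (0, 1)")
    if eps_a / eps_r + eps_a > 0.5:
        raise ValueError("assumed condition: eps_a/eps_r + eps_a <= 1/2")
    # Denominator of (2); it equals -eps_r * g(eps_a, eps_a / eps_r) > 0.
    denom = ((eps_a + eps_a * eps_r) * math.log(1 + eps_r)
             + (eps_r - eps_a - eps_a * eps_r)
             * math.log(1 - eps_a * eps_r / (eps_r - eps_a)))
    bound = eps_r * math.log(2 / delta) / denom
    return math.floor(bound) + 1
```

A call such as `sample_size(0.01, 0.1, 0.05)` gives a sample size that does not depend on the population size $N$, which is one reason the exact computation of [1] can be less conservative.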

2 Inverse Sampling of Finite Population 

To estimate the proportion $p$, a frequently used sampling method is the inverse sampling scheme described as follows:

Continue sampling from the population (without replacement) until $r$ units are found to carry the attribute or the sample size $n$ reaches the population size $N$. The estimator of the proportion $p$ is taken as the ratio $\hat{p} = \frac{k}{n}$, where $k$ is the number of units having the attribute among the $n$ units.
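The scheme above can be simulated with a few lines of Python. This is a minimal sketch, assuming the population is represented as a list of 0/1 indicators (1 meaning the unit carries the attribute); the function name `inverse_sample` is hypothetical.

```python
import random


def inverse_sample(population, r, rng):
    """Draw units without replacement until r attribute units are seen
    or the whole population is examined. Returns (k, n)."""
    order = list(population)
    rng.shuffle(order)          # a uniformly random drawing order
    k = 0
    for n, unit in enumerate(order, start=1):
        k += unit               # unit is 1 if it carries the attribute
        if k >= r:
            return k, n         # threshold reached: stop sampling
    return k, len(order)        # population exhausted before reaching r


rng = random.Random(1)
population = [1] * 30 + [0] * 70    # N = 100, M = 30
k, n = inverse_sample(population, 5, rng)
p_hat = k / n                       # estimate of p = 0.3
```

When $r$ exceeds $M$, the loop examines every unit, so that $k = M$, $n = N$ and the estimate coincides with $p$.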

Clearly, the reliability of the estimator p depends on the threshold value r. Hence, we are 
interested in a crucial question as follows: 

For prescribed margin of relative error $\varepsilon \in (0,1)$ and confidence parameter $\delta \in (0,1)$, how large should the threshold $r$ be to guarantee
\[
\Pr\{ |\hat{p} - p| < \varepsilon p \} > 1 - \delta?
\]

For this purpose, we have 
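Theorem 2 For any $\varepsilon \in (0,1)$,
\[
\Pr\{ |\hat{p} - p| \ge \varepsilon p \} \le \mathscr{Q}(\varepsilon, r),
\]
where
\[
\mathscr{Q}(\varepsilon, r) = (1+\varepsilon)^{-r} \exp\left( \frac{\varepsilon r}{1+\varepsilon} \right) + (1-\varepsilon)^{-r} \exp\left( -\frac{\varepsilon r}{1-\varepsilon} \right),
\]
which is monotonically decreasing with respect to $r$. Moreover, for any $\delta \in (0,1)$, there exists a unique number $r^*$ such that $\mathscr{Q}(\varepsilon, r^*) = \delta$ and
\[
\max\left\{ \frac{(1+\varepsilon) \ln \frac{1}{\delta}}{(1+\varepsilon)\ln(1+\varepsilon) - \varepsilon}, \ \frac{(1-\varepsilon) \ln \frac{1}{\delta}}{(1-\varepsilon)\ln(1-\varepsilon) + \varepsilon} \right\} < r^* < \frac{(1+\varepsilon) \ln \frac{2}{\delta}}{(1+\varepsilon)\ln(1+\varepsilon) - \varepsilon}.
\]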

The proof of Theorem 2 is given in Appendix B. As an immediate consequence of Theorem 2, 
we have 

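Corollary 1 Let $\varepsilon, \delta \in (0,1)$. Then, $\Pr\{ |\hat{p} - p| < \varepsilon p \} > 1 - \delta$ provided that
\[
r > \frac{(1+\varepsilon) \ln \frac{2}{\delta}}{(1+\varepsilon)\ln(1+\varepsilon) - \varepsilon}. \tag{3}
\]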
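To make the threshold rule concrete, the sketch below evaluates the bounding function of Theorem 2 and the smallest integer threshold exceeding the right-hand side of (3). The names `q_bound` and `threshold` are hypothetical. The bound is computed through logarithms to avoid overflow for large $r$, and its form assumes the reading $(1+\varepsilon)^{-r}\exp(\varepsilon r/(1+\varepsilon)) + (1-\varepsilon)^{-r}\exp(-\varepsilon r/(1-\varepsilon))$.

```python
import math


def q_bound(eps: float, r: float) -> float:
    """Upper bound on Pr{|p_hat - p| >= eps * p} from Theorem 2."""
    a = math.log1p(eps) - eps / (1 + eps)     # decay rate of the first term
    b = math.log1p(-eps) + eps / (1 - eps)    # decay rate of the second term
    return math.exp(-r * a) + math.exp(-r * b)


def threshold(eps: float, delta: float) -> int:
    """Smallest integer r strictly greater than the right-hand side of (3)."""
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("eps and delta must lie in (0, 1)")
    bound = ((1 + eps) * math.log(2 / delta)
             / ((1 + eps) * math.log(1 + eps) - eps))
    return math.floor(bound) + 1


r = threshold(0.1, 0.05)
risk = q_bound(0.1, r)   # stays below delta = 0.05 for this r
```

Since the bound is monotonically decreasing in $r$, a bisection on `q_bound` could locate $r^*$ itself and give a somewhat smaller threshold than (3).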



3 Multistage Fixed-width Confidence Intervals 

So far we have only considered point estimation for the proportion p. Interval estimation is also 
an important method for estimating p. Motivated by the fact that a confidence interval must be 
sufficiently narrow to be useful, we shall develop a multistage sampling scheme for constructing a 
fixed-width confidence interval for the proportion, p, of the finite population discussed in previous 
sections. 

Note that the procedure of sampling without replacement can be precisely described as follows: 
Each time a single unit is drawn without replacement from the remaining population so that 
every unit of the remaining population has equal chance of being selected. 

Such a sampling process can be exactly characterized by random variables $X_1, \cdots, X_N$ defined in a probability space $(\Omega, \mathscr{F}, \Pr)$ such that $X_i$ denotes the characteristic of the $i$-th sample in the sense that $X_i = 1$ if the $i$-th sample has the attribute and $X_i = 0$ otherwise. By the nature of the sampling procedure, it can be shown that
\[
\Pr\{ X_i = x_i, \ i = 1, \cdots, n \} = \binom{M}{\sum_{i=1}^n x_i} \binom{N - M}{n - \sum_{i=1}^n x_i} \Bigg/ \left[ \binom{n}{\sum_{i=1}^n x_i} \binom{N}{n} \right]
\]
for any $n \in \{1, \cdots, N\}$ and any $x_i \in \{0, 1\}$, $i = 1, \cdots, n$. Based on random variables $X_1, \cdots, X_N$, we can define a multistage sampling scheme of the following basic structure. The sampling process is divided into $s$ stages with sample sizes $n_1 < n_2 < \cdots < n_s$. The continuation or termination of sampling is determined by decision variables. For each stage with index $\ell$, a decision variable $D_\ell = \mathscr{D}_\ell(X_1, \cdots, X_{n_\ell})$ is defined based on random variables $X_1, \cdots, X_{n_\ell}$. The decision variable $D_\ell$ assumes only two possible values $0, 1$ with the notion that the sampling is continued until $D_\ell = 1$ for some $\ell \in \{1, \cdots, s\}$. Since the sampling must be terminated at or before the $s$-th stage, it is required that $D_s = 1$. For simplicity of notation, we also define $D_\ell = 0$ for $\ell = 0$.



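Our goal is to construct a fixed-width confidence interval $(L, U)$ such that $U - L < 2\varepsilon$ and that $\Pr\{ L < p < U \mid p \} > 1 - \delta$ for any $p \in \{ \frac{i}{N} : 0 \le i \le N \}$ with prescribed $\varepsilon \in (0, \frac{1}{2})$ and $\delta \in (0, 1)$. Toward this goal, we need to define some multivariate functions as follows.

For $\alpha \in (0,1)$ and integers $0 \le k \le n \le N$, let $\mathcal{L}(N, n, k, \alpha)$ be the smallest integer $M_l$ such that $\sum_{i=k}^{n} \binom{M_l}{i} \binom{N - M_l}{n - i} / \binom{N}{n} > \frac{\alpha}{2}$. Let $\mathcal{U}(N, n, k, \alpha)$ be the largest integer $M_u$ such that $\sum_{i=0}^{k} \binom{M_u}{i} \binom{N - M_u}{n - i} / \binom{N}{n} > \frac{\alpha}{2}$. Let $n_{\max}(N, \alpha)$ be the smallest number $n$ such that $\mathcal{U}(N, n, k, \alpha) - \mathcal{L}(N, n, k, \alpha) < 2 \varepsilon N$ for $0 \le k \le n$. Let $n_{\min}(N, \alpha)$ be the largest number $n$ such that $\mathcal{U}(N, n, k, \alpha) - \mathcal{L}(N, n, k, \alpha) > 2 \varepsilon N$ for $0 \le k \le n$.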
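The two limit functions can be evaluated by direct summation of hypergeometric probabilities. The sketch below is a plain linear search, assuming the definitions read "smallest (largest) integer whose upper (lower) tail probability exceeds $\alpha/2$"; the function names are hypothetical, and no attempt is made at the efficiency a large population would require.

```python
from math import comb


def tail_ge(N: int, M: int, n: int, k: int) -> float:
    """Pr{K >= k} for K hypergeometric with population N, M marked, n drawn."""
    return sum(comb(M, i) * comb(N - M, n - i) for i in range(k, n + 1)) / comb(N, n)


def tail_le(N: int, M: int, n: int, k: int) -> float:
    """Pr{K <= k} for the same hypergeometric variable."""
    return sum(comb(M, i) * comb(N - M, n - i) for i in range(0, k + 1)) / comb(N, n)


def lower_limit(N: int, n: int, k: int, alpha: float) -> int:
    """Smallest M whose upper tail at k exceeds alpha / 2."""
    for M in range(N + 1):
        if tail_ge(N, M, n, k) > alpha / 2:
            return M
    raise ValueError("no admissible M (should not happen for 0 <= k <= n <= N)")


def upper_limit(N: int, n: int, k: int, alpha: float) -> int:
    """Largest M whose lower tail at k exceeds alpha / 2."""
    for M in range(N, -1, -1):
        if tail_le(N, M, n, k) > alpha / 2:
            return M
    raise ValueError("no admissible M (should not happen for 0 <= k <= n <= N)")
```

For example, with $N = 20$, $n = 5$, $k = 0$ and $\alpha = 0.1$, these functions return the limits 0 and 8, since $\binom{12}{5}/\binom{20}{5} \approx 0.051$ exceeds $0.05$ while $\binom{11}{5}/\binom{20}{5} \approx 0.030$ does not.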

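Theorem 3 Let $\zeta > 0$ and $\rho > 0$. Let $n_1 < n_2 < \cdots < n_s$ be the ascending arrangement of all distinct elements of
\[
\left\{ \left\lceil \left( \frac{n_{\max}(N, \zeta\delta)}{n_{\min}(N, \zeta\delta)} \right)^{\frac{i}{\tau}} n_{\min}(N, \zeta\delta) \right\rceil : i = 0, 1, \cdots, \tau \right\} \quad \text{with} \quad \tau = \left\lceil \frac{\ln \frac{n_{\max}(N, \zeta\delta)}{n_{\min}(N, \zeta\delta)}}{\ln(1 + \rho)} \right\rceil.
\]
For $\ell = 1, \cdots, s$, define $K_\ell = \sum_{i=1}^{n_\ell} X_i$ and $D_\ell$ such that $D_\ell = 1$ if $\mathcal{U}(N, n_\ell, K_\ell, \zeta\delta) - \mathcal{L}(N, n_\ell, K_\ell, \zeta\delta) < 2 \varepsilon N$; and $D_\ell = 0$ otherwise. Suppose the stopping rule is that sampling is continued until $D_\ell = 1$ for some $\ell \in \{1, \cdots, s\}$. Define $L = \frac{1}{N} \times \mathcal{L}(N, \mathbf{n}, \sum_{i=1}^{\mathbf{n}} X_i, \zeta\delta)$ and $U = \frac{1}{N} \times \mathcal{U}(N, \mathbf{n}, \sum_{i=1}^{\mathbf{n}} X_i, \zeta\delta)$, where $\mathbf{n}$ is the sample size when the sampling is terminated. Then, a sufficient condition to guarantee $\Pr\{ L < p < U \mid p \} > 1 - \delta$ for any $p \in \{ \frac{i}{N} : 0 \le i \le N \}$ is that
\[
\sum_{\ell=1}^{s} \left[ \Pr\{ \mathcal{L}(N, n_\ell, K_\ell, \zeta\delta) > M, \ D_{\ell-1} = 0, \ D_\ell = 1 \mid M \} + \Pr\{ \mathcal{U}(N, n_\ell, K_\ell, \zeta\delta) < M, \ D_{\ell-1} = 0, \ D_\ell = 1 \mid M \} \right] < \delta \tag{4}
\]
for all $M \in \{0, 1, \cdots, N\}$, where (4) is satisfied if $\zeta > 0$ is sufficiently small.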
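The stopping rule of Theorem 3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the stage sizes are supplied by the caller rather than derived from $n_{\min}$ and $n_{\max}$, the argument `alpha` plays the role of $\zeta\delta$, the limits use the same assumed reading of $\mathcal{L}$ and $\mathcal{U}$ as above, and the coverage tuning of $\zeta$ via (4) is not performed. All function names are hypothetical.

```python
import random
from math import comb


def tail_ge(N, M, n, k):
    """Pr{K >= k} for a hypergeometric count K."""
    return sum(comb(M, i) * comb(N - M, n - i) for i in range(k, n + 1)) / comb(N, n)


def tail_le(N, M, n, k):
    """Pr{K <= k} for a hypergeometric count K."""
    return sum(comb(M, i) * comb(N - M, n - i) for i in range(0, k + 1)) / comb(N, n)


def limits(N, n, k, alpha):
    """Lower and upper limits on M given k attribute units among n drawn."""
    lower = next(M for M in range(N + 1) if tail_ge(N, M, n, k) > alpha / 2)
    upper = next(M for M in range(N, -1, -1) if tail_le(N, M, n, k) > alpha / 2)
    return lower, upper


def multistage_interval(units, stages, eps, alpha, rng):
    """Sample stage by stage; stop once the interval is narrower than 2*eps.
    The last stage always terminates the sampling (D_s = 1)."""
    N = len(units)
    order = list(units)
    rng.shuffle(order)                 # random order of draws without replacement
    for n in stages:
        k = sum(order[:n])             # K_l: attribute count among the first n draws
        lower, upper = limits(N, n, k, alpha)
        if upper - lower < 2 * eps * N:
            break                      # D_l = 1: stop here
    return lower / N, upper / N, n
```

For instance, `multistage_interval([1] * 18 + [0] * 42, [15, 30, 45, 60], 0.1, 0.05, random.Random(0))` returns an interval of width below 0.2 together with the stage size at which sampling stopped; choosing the last stage equal to $N$ guarantees that the width requirement is met.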

It should be noted that Theorem 3 employs the double-decision-variable method recently proposed by Chen in [1]. To further reduce computational complexity, the techniques of bisection confidence tuning and domain truncation developed in [1], [2] can be very useful.

A Proof of Theorem 1 

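To prove the theorem, we shall introduce the function
\[
g(\varepsilon, p) = (p + \varepsilon) \ln \frac{p}{p + \varepsilon} + (1 - p - \varepsilon) \ln \frac{1 - p}{1 - p - \varepsilon},
\]
where $0 < \varepsilon < 1 - p$. We need some preliminary results.

The following lemma is due to Hoeffding [7].

Lemma 1
\[
\Pr\{ \hat{p} \ge p + \varepsilon \} \le \exp( n \, g(\varepsilon, p) ) \quad \text{for } 0 < \varepsilon < 1 - p < 1,
\]
\[
\Pr\{ \hat{p} \le p - \varepsilon \} \le \exp( n \, g(-\varepsilon, p) ) \quad \text{for } 0 < \varepsilon < p < 1.
\]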
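As a numerical sanity check of the first inequality of Lemma 1, the following sketch compares the exact hypergeometric upper tail with the exponential bound. The function names and the test values are hypothetical choices made for this illustration, and the comparison uses the reading $\Pr\{\hat{p} \ge p + \varepsilon\} \le \exp(n\, g(\varepsilon, p))$.

```python
import math
from math import comb


def g(eps: float, p: float) -> float:
    """Exponent g(eps, p) defined at the start of Appendix A."""
    return ((p + eps) * math.log(p / (p + eps))
            + (1 - p - eps) * math.log((1 - p) / (1 - p - eps)))


def exact_upper_tail(N: int, M: int, n: int, eps: float) -> float:
    """Exact Pr{k/n >= M/N + eps} when n units are drawn without replacement."""
    cutoff = n * (M / N + eps)
    favorable = sum(comb(M, i) * comb(N - M, n - i)
                    for i in range(n + 1) if i >= cutoff)
    return favorable / comb(N, n)


N, M, n, eps = 100, 30, 20, 0.17
exact = exact_upper_tail(N, M, n, eps)        # exact tail probability
bound = math.exp(n * g(eps, M / N))           # Hoeffding-type bound
```

In this example the exact tail lies well below the bound, which illustrates the conservativeness that the exact method of [1] is designed to reduce.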
The following Lemmas 2-4 have been established in [3]. 



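Lemma 2 Let $0 < \varepsilon < \frac{1}{2}$. Then, $g(\varepsilon, p)$ is monotonically increasing with respect to $p \in (0, \frac{1}{2} - \varepsilon)$ and monotonically decreasing with respect to $p \in (\frac{1}{2}, 1 - \varepsilon)$. Similarly, $g(-\varepsilon, p)$ is monotonically increasing with respect to $p \in (\varepsilon, \frac{1}{2})$ and monotonically decreasing with respect to $p \in (\frac{1}{2} + \varepsilon, 1)$.

Lemma 3 Let $0 < \varepsilon < \frac{1}{2}$. Then,
\[
g(\varepsilon, p) > g(-\varepsilon, p) \quad \forall p \in \left( \varepsilon, \tfrac{1}{2} \right),
\]
\[
g(\varepsilon, p) < g(-\varepsilon, p) \quad \forall p \in \left( \tfrac{1}{2}, 1 - \varepsilon \right).
\]

Lemma 4 Let $0 < \varepsilon < 1$. Then, $g(\varepsilon p, p)$ is monotonically decreasing with respect to $p \in \left( 0, \frac{1}{1 + \varepsilon} \right)$. Similarly, $g(-\varepsilon p, p)$ is monotonically decreasing with respect to $p \in (0, 1)$.

Lemma 5 Suppose $0 < \varepsilon_r < 1$ and $0 < \frac{\varepsilon_a}{\varepsilon_r} + \varepsilon_a \le \frac{1}{2}$. Then,
\[
\Pr\{ \hat{p} \le p - \varepsilon_a \} \le \exp\left( n \, g\left( -\varepsilon_a, \frac{\varepsilon_a}{\varepsilon_r} \right) \right) \tag{5}
\]
for $0 < p \le \frac{\varepsilon_a}{\varepsilon_r}$.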

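Proof. We shall show (5) by investigating three cases as follows. In the case of $p < \varepsilon_a$, it is clear that
\[
\Pr\{ \hat{p} \le p - \varepsilon_a \} = 0 < \exp\left( n \, g\left( -\varepsilon_a, \frac{\varepsilon_a}{\varepsilon_r} \right) \right).
\]
In the case of $p = \varepsilon_a$, we have
\[
\begin{aligned}
\Pr\{ \hat{p} \le p - \varepsilon_a \} &= \Pr\{ \hat{p} = 0 \} = \Pr\{ k = 0 \} = \binom{N - M}{n} \Big/ \binom{N}{n} \le \left( \frac{N - M}{N} \right)^n \\
&= (1 - p)^n = (1 - \varepsilon_a)^n = \lim_{p \to \varepsilon_a} \exp( n \, g(-\varepsilon_a, p) ) \\
&< \exp\left( n \, g\left( -\varepsilon_a, \frac{\varepsilon_a}{\varepsilon_r} \right) \right),
\end{aligned}
\]
where the last inequality follows from Lemma 2 and the fact that $\varepsilon_a < \frac{\varepsilon_a}{\varepsilon_r} \le \frac{1}{2} - \varepsilon_a$.

In the case of $\varepsilon_a < p \le \frac{\varepsilon_a}{\varepsilon_r}$, we have
\[
\Pr\{ \hat{p} \le p - \varepsilon_a \} \le \exp( n \, g(-\varepsilon_a, p) ) \le \exp\left( n \, g\left( -\varepsilon_a, \frac{\varepsilon_a}{\varepsilon_r} \right) \right),
\]
where the first inequality follows from Lemma 1 and the second inequality follows from Lemma 2 and the fact that $\varepsilon_a < \frac{\varepsilon_a}{\varepsilon_r} \le \frac{1}{2} - \varepsilon_a$. So, (5) is established. □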



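Lemma 6 Suppose $0 < \varepsilon_r < 1$ and $0 < \frac{\varepsilon_a}{\varepsilon_r} + \varepsilon_a \le \frac{1}{2}$. Then,
\[
\Pr\{ \hat{p} \ge (1 + \varepsilon_r) p \} \le \exp\left( n \, g\left( \varepsilon_a, \frac{\varepsilon_a}{\varepsilon_r} \right) \right) \tag{6}
\]
for $\frac{\varepsilon_a}{\varepsilon_r} < p < 1$.

Proof. We shall show (6) by investigating three cases as follows. In the case of $p > \frac{1}{1 + \varepsilon_r}$, it is clear that
\[
\Pr\{ \hat{p} \ge (1 + \varepsilon_r) p \} = 0 < \exp\left( n \, g\left( \varepsilon_a, \frac{\varepsilon_a}{\varepsilon_r} \right) \right).
\]
In the case of $p = \frac{1}{1 + \varepsilon_r}$, we have
\[
\begin{aligned}
\Pr\{ \hat{p} \ge (1 + \varepsilon_r) p \} &= \Pr\{ \hat{p} = 1 \} = \Pr\{ k = n \} = \binom{M}{n} \Big/ \binom{N}{n} \le \left( \frac{M}{N} \right)^n \\
&= p^n = \left( \frac{1}{1 + \varepsilon_r} \right)^n = \lim_{p \to \frac{1}{1 + \varepsilon_r}} \exp( n \, g(\varepsilon_r p, p) ) \\
&< \exp\left( n \, g\left( \varepsilon_a, \frac{\varepsilon_a}{\varepsilon_r} \right) \right),
\end{aligned}
\]
where the last inequality follows from Lemma 4 and the fact that $\frac{\varepsilon_a}{\varepsilon_r} \le \frac{1}{2} \cdot \frac{1}{1 + \varepsilon_r} < \frac{1}{1 + \varepsilon_r}$ as a result of $0 < \frac{\varepsilon_a}{\varepsilon_r} + \varepsilon_a \le \frac{1}{2}$.

In the case of $\frac{\varepsilon_a}{\varepsilon_r} < p < \frac{1}{1 + \varepsilon_r}$, we have
\[
\Pr\{ \hat{p} \ge (1 + \varepsilon_r) p \} \le \exp( n \, g(\varepsilon_r p, p) ) < \exp\left( n \, g\left( \varepsilon_a, \frac{\varepsilon_a}{\varepsilon_r} \right) \right),
\]
where the first inequality follows from Lemma 1 and the second inequality follows from Lemma 4. So, (6) is established. □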

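We are now in a position to prove the theorem. We shall assume (2) is satisfied and show that (1) is true. It suffices to show that
\[
\Pr\{ |\hat{p} - p| \ge \varepsilon_a, \ |\hat{p} - p| \ge \varepsilon_r p \} < \delta.
\]
For $0 < p \le \frac{\varepsilon_a}{\varepsilon_r}$, we have
\[
\Pr\{ |\hat{p} - p| \ge \varepsilon_a, \ |\hat{p} - p| \ge \varepsilon_r p \} = \Pr\{ |\hat{p} - p| \ge \varepsilon_a \} = \Pr\{ \hat{p} \ge p + \varepsilon_a \} + \Pr\{ \hat{p} \le p - \varepsilon_a \}. \tag{7}
\]
Noting that $0 < p + \varepsilon_a \le \frac{\varepsilon_a}{\varepsilon_r} + \varepsilon_a \le \frac{1}{2}$, we have
\[
\Pr\{ \hat{p} \ge p + \varepsilon_a \} \le \exp( n \, g(\varepsilon_a, p) ) \le \exp\left( n \, g\left( \varepsilon_a, \frac{\varepsilon_a}{\varepsilon_r} \right) \right),
\]
where the first inequality follows from Lemma 1 and the second inequality follows from Lemma 2. It can be checked that (2) is equivalent to
\[
\exp\left( n \, g\left( \varepsilon_a, \frac{\varepsilon_a}{\varepsilon_r} \right) \right) < \frac{\delta}{2}.
\]
Therefore,
\[
\Pr\{ \hat{p} \ge p + \varepsilon_a \} < \frac{\delta}{2}
\]
for $0 < p \le \frac{\varepsilon_a}{\varepsilon_r}$.

On the other hand, since $\varepsilon_a < \frac{\varepsilon_a}{\varepsilon_r} < \frac{1}{2}$, by Lemma 5 and Lemma 3, we have
\[
\Pr\{ \hat{p} \le p - \varepsilon_a \} \le \exp\left( n \, g\left( -\varepsilon_a, \frac{\varepsilon_a}{\varepsilon_r} \right) \right) \le \exp\left( n \, g\left( \varepsilon_a, \frac{\varepsilon_a}{\varepsilon_r} \right) \right) < \frac{\delta}{2}
\]
for $0 < p \le \frac{\varepsilon_a}{\varepsilon_r}$. Hence, by (7),
\[
\Pr\{ |\hat{p} - p| \ge \varepsilon_a, \ |\hat{p} - p| \ge \varepsilon_r p \} < \frac{\delta}{2} + \frac{\delta}{2} = \delta.
\]
This proves (1) for $0 < p \le \frac{\varepsilon_a}{\varepsilon_r}$.

For $\frac{\varepsilon_a}{\varepsilon_r} < p < 1$, we have
\[
\Pr\{ |\hat{p} - p| \ge \varepsilon_a, \ |\hat{p} - p| \ge \varepsilon_r p \} = \Pr\{ |\hat{p} - p| \ge \varepsilon_r p \} = \Pr\{ \hat{p} \ge p + \varepsilon_r p \} + \Pr\{ \hat{p} \le p - \varepsilon_r p \}.
\]
Invoking Lemma 6, we have
\[
\Pr\{ \hat{p} \ge p + \varepsilon_r p \} \le \exp\left( n \, g\left( \varepsilon_a, \frac{\varepsilon_a}{\varepsilon_r} \right) \right).
\]
On the other hand,
\[
\Pr\{ \hat{p} \le p - \varepsilon_r p \} \le \exp( n \, g(-\varepsilon_r p, p) ) \le \exp\left( n \, g\left( -\varepsilon_a, \frac{\varepsilon_a}{\varepsilon_r} \right) \right) \le \exp\left( n \, g\left( \varepsilon_a, \frac{\varepsilon_a}{\varepsilon_r} \right) \right),
\]
where the first inequality follows from Lemma 1, the second inequality follows from Lemma 4, and the last inequality follows from Lemma 3. Hence,
\[
\Pr\{ |\hat{p} - p| \ge \varepsilon_a, \ |\hat{p} - p| \ge \varepsilon_r p \} \le 2 \exp\left( n \, g\left( \varepsilon_a, \frac{\varepsilon_a}{\varepsilon_r} \right) \right) < \delta.
\]
This proves (1) for $\frac{\varepsilon_a}{\varepsilon_r} < p < 1$. The proof of Theorem 1 is thus completed.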



B Proof of Theorem 2



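We need some preliminary results. We shall introduce functions
\[
\mathscr{H}(z, p) = z \ln \frac{p}{z} + (1 - z) \ln \frac{1 - p}{1 - z}
\]
and
\[
\mathscr{M}(z, p) = \ln \frac{p}{z} + \left( \frac{1}{z} - 1 \right) \ln \frac{1 - p}{1 - z}
\]
for $0 < z < 1$ and $0 < p < 1$, so that
\[
\mathscr{H}(z, p) = z \, \mathscr{M}(z, p).
\]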



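Lemma 7 Suppose $1 \le r \le M < N$. Then,
\[
\Pr\left\{ \frac{r}{\mathbf{n}} \le (1 - \varepsilon) p \right\} \le (1 - \varepsilon)^{-r} \exp\left( -\frac{\varepsilon r}{1 - \varepsilon} \right).
\]

Proof. Clearly,
\[
\Pr\left\{ \frac{r}{\mathbf{n}} \le (1 - \varepsilon) p \right\} = \Pr\left\{ \mathbf{n} \ge \frac{r}{(1 - \varepsilon) p} \right\} = \Pr\{ \mathbf{n} \ge m \},
\]
where $m = \left\lceil \frac{r}{(1 - \varepsilon) p} \right\rceil$ and $\mathbf{n}$ denotes the sample size at termination. It can be seen that there exists a real number $\varepsilon^* \in (0, 1)$ such that $\varepsilon^* \ge \varepsilon$ and
\[
\frac{r}{(1 - \varepsilon^*) p} = \left\lceil \frac{r}{(1 - \varepsilon) p} \right\rceil.
\]
Now let $K_m$ be the number of units having a certain attribute among $m$ units drawn by sampling without replacement from a finite population of size $N$ with $M$ units having the attribute. Then,
\[
\Pr\{ \mathbf{n} \ge m \} \le \Pr\{ K_m \le r \} = \Pr\left\{ \frac{K_m}{m} \le \frac{r}{m} \right\} = \Pr\left\{ \frac{K_m}{m} \le (1 - \varepsilon^*) p \right\}.
\]
Applying the well-known Hoeffding inequality [7] for the case of finite population, we have
\[
\begin{aligned}
\Pr\left\{ \frac{K_m}{m} \le (1 - \varepsilon^*) p \right\} &\le \exp( m \, \mathscr{H}(p - \varepsilon^* p, p) ) = \exp\left( \frac{r}{(1 - \varepsilon^*) p} \, \mathscr{H}(p - \varepsilon^* p, p) \right) \\
&= \exp( r \, \mathscr{M}(p - \varepsilon^* p, p) ) \le \exp( r \, \mathscr{M}(p - \varepsilon p, p) ),
\end{aligned}
\]
where the last inequality follows from $\varepsilon^* \ge \varepsilon$ and the monotone property of $\mathscr{M}(p - \varepsilon p, p)$ with respect to $\varepsilon$, which has been established as Lemma 5 in [5].

From the proof of Lemma 6 of [5], we know that $\mathscr{M}(p - \varepsilon p, p)$ is monotonically decreasing with respect to $p \in (0, 1)$. Hence,
\[
\Pr\left\{ \frac{r}{\mathbf{n}} \le (1 - \varepsilon) p \right\} \le \exp( r \, \mathscr{M}(p - \varepsilon p, p) ) \le \lim_{p \to 0} \exp( r \, \mathscr{M}(p - \varepsilon p, p) ) = (1 - \varepsilon)^{-r} \exp\left( -\frac{\varepsilon r}{1 - \varepsilon} \right).
\]
The proof of the lemma is thus completed. □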



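Lemma 8 Suppose $1 \le r \le M < N$ and $p + \varepsilon p < 1$. Then,
\[
\Pr\left\{ \frac{r}{\mathbf{n}} \ge (1 + \varepsilon) p \right\} \le (1 + \varepsilon)^{-r} \exp\left( \frac{\varepsilon r}{1 + \varepsilon} \right).
\]

Proof. It is clear that
\[
\Pr\left\{ \frac{r}{\mathbf{n}} \ge (1 + \varepsilon) p \right\} = \Pr\left\{ \mathbf{n} \le \frac{r}{(1 + \varepsilon) p} \right\} = \Pr\{ \mathbf{n} \le m \},
\]
where $m = \left\lfloor \frac{r}{(1 + \varepsilon) p} \right\rfloor$ and $\mathbf{n}$ denotes the sample size at termination. It can be seen that there exists a real number $\varepsilon^* \in (0, 1)$ such that $\varepsilon^* \ge \varepsilon$ and
\[
\frac{r}{(1 + \varepsilon^*) p} = \left\lfloor \frac{r}{(1 + \varepsilon) p} \right\rfloor.
\]
Now let $K_m$ be the number of units having a certain attribute among $m$ units drawn by sampling without replacement from a finite population of size $N$ with $M$ units having the attribute. Then,
\[
\Pr\{ \mathbf{n} \le m \} = \Pr\{ K_m \ge r \} = \Pr\left\{ \frac{K_m}{m} \ge \frac{r}{m} \right\} = \Pr\left\{ \frac{K_m}{m} \ge (1 + \varepsilon^*) p \right\}.
\]
Applying the well-known Hoeffding inequality [7] for the case of finite population, we have
\[
\begin{aligned}
\Pr\left\{ \frac{K_m}{m} \ge (1 + \varepsilon^*) p \right\} &\le \exp( m \, \mathscr{H}(p + \varepsilon^* p, p) ) = \exp\left( \frac{r}{(1 + \varepsilon^*) p} \, \mathscr{H}(p + \varepsilon^* p, p) \right) \\
&= \exp( r \, \mathscr{M}(p + \varepsilon^* p, p) ) \le \exp( r \, \mathscr{M}(p + \varepsilon p, p) ),
\end{aligned}
\]
where the last inequality follows from $\varepsilon^* \ge \varepsilon$ and the monotone property of $\mathscr{M}(p + \varepsilon p, p)$ with respect to $\varepsilon$, which has been established as Lemma 5 in [5].

From the proof of Lemma 6 of [5], we know that $\mathscr{M}(p + \varepsilon p, p)$ is monotonically decreasing with respect to $p \in \left( 0, \frac{1}{1 + \varepsilon} \right)$. Hence,
\[
\Pr\left\{ \frac{r}{\mathbf{n}} \ge (1 + \varepsilon) p \right\} \le \exp( r \, \mathscr{M}(p + \varepsilon p, p) ) \le \lim_{p \to 0} \exp( r \, \mathscr{M}(p + \varepsilon p, p) ) = (1 + \varepsilon)^{-r} \exp\left( \frac{\varepsilon r}{1 + \varepsilon} \right).
\]
The proof of the lemma is thus completed. □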

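Now we are in a position to prove Theorem 2. We shall consider the following cases:

Case (i): $M < r$;
Case (ii): $M = N$;
Case (iii): $r = N$;
Case (iv): $1 \le r \le M < N$ and $p < \frac{1}{1 + \varepsilon}$;
Case (v): $1 \le r \le M < N$ and $p = \frac{1}{1 + \varepsilon}$;
Case (vi): $1 \le r \le M < N$ and $p > \frac{1}{1 + \varepsilon}$.

In Case (i), we have $n = N$ and $k = M$. Hence, $\hat{p} = p$ and $\Pr\{ |\hat{p} - p| \ge \varepsilon p \} = 0 < \mathscr{Q}(\varepsilon, r)$.

In Case (ii), we have $\hat{p} = p$ and $\Pr\{ |\hat{p} - p| \ge \varepsilon p \} = 0 < \mathscr{Q}(\varepsilon, r)$.

In Case (iii), we have $\hat{p} = p$ and $\Pr\{ |\hat{p} - p| \ge \varepsilon p \} = 0 < \mathscr{Q}(\varepsilon, r)$.

In Case (iv), we have $k = r$ and, by Lemma 7 and Lemma 8,
\[
\begin{aligned}
\Pr\{ |\hat{p} - p| \ge \varepsilon p \} &= \Pr\left\{ \frac{r}{n} \le (1 - \varepsilon) p \right\} + \Pr\left\{ \frac{r}{n} \ge (1 + \varepsilon) p \right\} \\
&\le (1 - \varepsilon)^{-r} \exp\left( -\frac{\varepsilon r}{1 - \varepsilon} \right) + (1 + \varepsilon)^{-r} \exp\left( \frac{\varepsilon r}{1 + \varepsilon} \right) = \mathscr{Q}(\varepsilon, r).
\end{aligned}
\]

In Case (v), we have $k = r$ and
\[
\begin{aligned}
\Pr\{ |\hat{p} - p| \ge \varepsilon p \} &= \Pr\left\{ \frac{r}{n} \le (1 - \varepsilon) p \right\} + \Pr\left\{ \frac{r}{n} \ge (1 + \varepsilon) p \right\} \\
&= \Pr\left\{ \frac{r}{n} \le (1 - \varepsilon) p \right\} + \Pr\{ k = n = r \}.
\end{aligned}
\]
Notice that
\[
\Pr\{ k = n = r \} = \binom{M}{r} \Big/ \binom{N}{r} \le \left( \frac{M}{N} \right)^r = \left( \frac{1}{1 + \varepsilon} \right)^r < (1 + \varepsilon)^{-r} \exp\left( \frac{\varepsilon r}{1 + \varepsilon} \right)
\]
as a result of $M < N$. Therefore, by Lemma 7,
\[
\Pr\{ |\hat{p} - p| \ge \varepsilon p \} < (1 - \varepsilon)^{-r} \exp\left( -\frac{\varepsilon r}{1 - \varepsilon} \right) + (1 + \varepsilon)^{-r} \exp\left( \frac{\varepsilon r}{1 + \varepsilon} \right) = \mathscr{Q}(\varepsilon, r).
\]

In Case (vi), we have $k = r$, $\Pr\left\{ \frac{r}{n} \ge (1 + \varepsilon) p \right\} = 0$ and, by Lemma 7,
\[
\begin{aligned}
\Pr\{ |\hat{p} - p| \ge \varepsilon p \} &= \Pr\left\{ \frac{r}{n} \le (1 - \varepsilon) p \right\} + \Pr\left\{ \frac{r}{n} \ge (1 + \varepsilon) p \right\} = \Pr\left\{ \frac{r}{n} \le (1 - \varepsilon) p \right\} \\
&\le (1 - \varepsilon)^{-r} \exp\left( -\frac{\varepsilon r}{1 - \varepsilon} \right) < \mathscr{Q}(\varepsilon, r).
\end{aligned}
\]

So, we have shown $\Pr\{ |\hat{p} - p| \ge \varepsilon p \} \le \mathscr{Q}(\varepsilon, r)$. The other statements of Theorem 2 have been established in [5].

This concludes the proof of Theorem 2.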